OUTLIERS IN THE SPECTRUM OF IID MATRICES WITH BOUNDED RANK PERTURBATIONS

Abstract. It is known that if one perturbs a large iid random matrix by a bounded rank error, then the
majority of the eigenvalues will remain distributed according to the circular law. However, the bounded rank
perturbation may also create one or more outlier eigenvalues. We show that if the perturbation is small,
then the outlier eigenvalues are created next to the outlier eigenvalues of the bounded rank perturbation;
but if the perturbation is large, then many more outliers can be created, and their law is governed by the
zeroes of a random Laurent series with Gaussian coefficients. On the other hand, these outliers may be
eliminated by enforcing a row sum condition on the final matrix.



1. Introduction 



This paper is concerned with the study of outliers of the circular law for iid random matrices and its
variants. To recall this law, we make some definitions: 

Definition 1.1 (iid random matrix). An iid random matrix is an $n \times n$ random matrix $X_n = (x_{ij})_{1 \le i,j \le n}$ (or
more precisely, a nested sequence $X_1, X_2, \ldots$ of such matrices) whose entries $x_{ij}$ for $i,j \ge 1$ are independent
identically distributed complex entries, which we normalise to have mean zero and variance one. We say
that such a matrix has atom distribution $x$ if all the $x_{ij}$ have distribution $x$, thus $\mathbf{E}x = 0$ and $\mathbf{E}|x|^2 = 1$.

Definition 1.2 (ESD). Given an $n \times n$ complex matrix $A_n$ (not necessarily Hermitian or normal), we define
the empirical spectral distribution $\mu_{A_n}$ of $A_n$ to be the probability measure

$$\mu_{A_n} := \frac{1}{n} \sum_{j=1}^n \delta_{\lambda_j}$$

where $\lambda_j = \lambda_j(A_n)$ for $j = 1, \ldots, n$ are the eigenvalues of $A_n$ (counting multiplicity, and ordered arbitrarily).

If $A_n$ is a random $n \times n$ complex matrix (so that $\mu_{A_n}$ is also random), we say that $\mu_{A_n}$ converges in
probability (resp. almost surely) to another (Borel) probability measure $\mu$ on the complex plane $\mathbf{C}$ if for
every smooth, compactly supported function $F : \mathbf{C} \to \mathbf{C}$, $\int_{\mathbf{C}} F\, d\mu_{A_n}$ converges in probability (resp. almost
surely) to $\int_{\mathbf{C}} F\, d\mu$.

The following theorem is the culmination of the work of many authors , [35] , [23] , [3] , [S] , [H] , [IS] , [33] , 

m. m- 

Theorem 1.3 (Circular law for iid matrices). Let $X_n$ be an iid random matrix. Then $\mu_{\frac{1}{\sqrt{n}}X_n}$ converges
almost surely (and hence also in probability) to the circular measure $\mu_c$, where $d\mu_c := \frac{1}{\pi} 1_{|z| \le 1}\, dz$.

The result as stated is [SI Theorem 1.15], but this result is based on a large number of partial results 
(in which more hypotheses are placed on the atom distribution x) which are proven in the previously cited 
papers. 



T. Tao is supported by a grant from the MacArthur Foundation, by NSF grant DMS-0649473, and by the NSF Waterman
award.



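The circular law implies in particular that the spectral radius

$$\rho\Big(\frac{1}{\sqrt{n}}X_n\Big) = \lim_{k \to \infty} \Big\|\Big(\frac{1}{\sqrt{n}}X_n\Big)^k\Big\|_{op}^{1/k} = \sup_{1 \le j \le n} \Big|\lambda_j\Big(\frac{1}{\sqrt{n}}X_n\Big)\Big|$$

is at least $1 - o(1)$ almost surely, where $o(1)$ goes to zero as $n \to \infty$. When the atom distribution $x$ has finite
fourth moment, we in fact have an asymptotic for the spectral radius: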

Theorem 1.4 (No outliers for iid matrices). Let $X_n$ be an iid random matrix whose atom distribution $x$
has finite fourth moment: $\mathbf{E}|x|^4 < \infty$. Then $\rho(\frac{1}{\sqrt{n}}X_n)$ converges to 1 almost surely (and hence also in
probability) as $n \to \infty$. In fact, for any fixed $m \ge 1$, $\|(\frac{1}{\sqrt{n}}X_n)^m\|_{op}$ converges to $m + 1$ almost surely (and
hence also in probability) as $n \to \infty$.

Furthermore, if all moments of $x$ are finite, one has the tail bound

$$\Big\|\Big(\frac{1}{\sqrt{n}}X_n\Big)^m\Big\|_{op} \le m + 1 + \varepsilon$$

with overwhelming probability$^1$ for each $\varepsilon > 0$.



Proof. This follows from what is by now a routine application of the truncation method and the moment 
method; see |TD], [22], or [21 Theorem 5.17]. We remark that the tail bound can also be deduced from 
the main result using the Talagrand concentration inequality (after first truncating to the case when x is 
bounded); see [2S], [T], [32]. The precise expression $m + 1$ is not important for our arguments here; any
quantity that was subexponential in $m$ would have sufficed. □



Informally, Theorem 1.4 asserts that when the fourth moment is finite, there are no significant outliers to
the circular law: with probability$^2$ $1 - o(1)$, all of the eigenvalues of the matrix $\frac{1}{\sqrt{n}}X_n$ lie within $o(1)$ of the
support $\{z \in \mathbf{C} : |z| \le 1\}$ of the circular law. The fourth moment condition here is necessary for the second
conclusion of Theorem 1.4 (see [9]), and it is very likely that it is also necessary for the first conclusion.



Now we consider the circular law and its outliers for random matrices formed as a low-rank perturbation 
of an iid random matrix. The circular law is stable under such perturbations: 

Theorem 1.5 (Circular law for low rank perturbations of iid matrices). [H, Corollary 1.17] Let $X_n$ be an
iid random matrix, and for each $n$, let $C_n$ be a deterministic matrix with rank $o(n)$ obeying the Frobenius
norm bound

$$\|C_n\|_F := (\operatorname{trace} C_n C_n^*)^{1/2} = O(n^{1/2}).$$

Then $\mu_{\frac{1}{\sqrt{n}}X_n + C_n}$ converges both in probability and in the almost sure sense to the circular measure $\mu_c$.

Remark 1.6. Thanks to a recent result of Bordenave [16], the $O(n^{1/2})$ bound here can be relaxed to $O(n^{O(1)})$.



However, the low rank perturbation $C_n$ can now create outliers. Our first main result is to describe these
outliers in the case when $C_n$ has bounded rank and bounded operator norm, and $x$ has finite fourth moment.
In this case, it turns out that the outliers of $\frac{1}{\sqrt{n}}X_n + C_n$ are close to those of $C_n$. More precisely, we have

Theorem 1.7 (Outliers for small low rank perturbations of iid matrices). Let $X_n$ be an iid random matrix
whose atom distribution has finite fourth moment, and for each $n$, let $C_n$ be a deterministic matrix with
rank $O(1)$ and operator norm $O(1)$. Let $\varepsilon > 0$, and suppose that for all sufficiently large $n$, there are no
eigenvalues of $C_n$ in the band $\{z \in \mathbf{C} : 1 + \varepsilon < |z| < 1 + 3\varepsilon\}$, and there are $j$ eigenvalues $\lambda_1(C_n), \ldots, \lambda_j(C_n)$
for some $j = O(1)$ in the region $\{z \in \mathbf{C} : |z| \ge 1 + 3\varepsilon\}$. Then, almost surely, for sufficiently large $n$, there are

$^1$We say that an event $E_n$ depending on $n$ occurs with overwhelming probability if for every $A > 0$ there exists $C > 0$ such
that $\mathbf{P}(E_n) \ge 1 - Cn^{-A}$ for all $n$.

$^2$The asymptotic notation $O()$, $o()$ that we use here will be defined in Section 1.3.



Figure 1. This figure shows the eigenvalues of three 200 by 200 iid random matrices with
atom distribution $x$ defined by $\mathbf{P}(x = 1) = \mathbf{P}(x = -1) = 1/2$, each of which was perturbed
by adding the matrix $\operatorname{diag}(2 + i, 3, 2, 0, 0, \ldots, 0)$. The small circles are centered at $2 + i$, 2,
and 3, respectively, and each have radius $n^{-1/4}$ where $n = 200$. (Figure by Phillip Wood.)



precisely $j$ eigenvalues $\lambda_1(\frac{1}{\sqrt{n}}X_n + C_n), \ldots, \lambda_j(\frac{1}{\sqrt{n}}X_n + C_n)$ of $\frac{1}{\sqrt{n}}X_n + C_n$ in the region $\{z \in \mathbf{C} : |z| \ge 1 + 2\varepsilon\}$,
and after labeling these eigenvalues properly, $\lambda_i(\frac{1}{\sqrt{n}}X_n + C_n) = \lambda_i(C_n) + o(1)$ as $n \to \infty$ for each $1 \le i \le j$.

Thus, for instance, if one perturbs $\frac{1}{\sqrt{n}}X_n$ by a bounded rank, bounded operator norm matrix $C_n$ whose
eigenvalues all lie inside the unit disk $D := \{z \in \mathbf{C} : |z| < 1\}$ (e.g. $C_n$ could be a nilpotent matrix), then no
outliers are created; but once $C_n$ has eigenvalues leaving the unit disk, the perturbed matrix $\frac{1}{\sqrt{n}}X_n + C_n$
will also have outliers in asymptotically the same location. Theorem 1.7 is illustrated in Figures 1 and 2.
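The behaviour described by Theorem 1.7 can be explored numerically. The following is a minimal sketch, not the code used for the figures: it assumes NumPy, uses the random sign atom distribution and the perturbation $\operatorname{diag}(2 + i, 3, 2, 0, \ldots, 0)$ from the figure captions, and the function name, the size $n = 400$, the seed and the cutoff radius 1.5 are illustrative choices.

```python
import numpy as np

def outliers_of_perturbed_matrix(n=400, seed=0, radius=1.5):
    """Eigenvalues of X_n/sqrt(n) + C_n lying outside the disk |z| <= radius."""
    rng = np.random.default_rng(seed)
    # Atom distribution: independent random signs (mean zero, variance one).
    x = rng.choice([-1.0, 1.0], size=(n, n))
    # Bounded rank perturbation C_n = diag(2 + i, 3, 2, 0, ..., 0).
    c_diag = np.zeros(n, dtype=complex)
    c_diag[:3] = [2 + 1j, 3, 2]
    eigs = np.linalg.eigvals(x / np.sqrt(n) + np.diag(c_diag))
    return eigs[np.abs(eigs) > radius]

outliers = outliers_of_perturbed_matrix()
for target in (2 + 1j, 3, 2):
    # Report the outlier closest to each eigenvalue of C_n.
    nearest = outliers[np.argmin(np.abs(outliers - target))]
    print(target, nearest, abs(nearest - target))
```

For moderately large $n$ one expects exactly three eigenvalues outside the cutoff radius, each close to one of the three nonzero eigenvalues of $C_n$.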

Remark 1.8. An analogous result for Wigner matrices instead of iid matrices has recently been established in
[18], [37], with more precise control (in particular, a central limit theorem) on the distribution of the outlier
eigenvalues; the methods used are somewhat different, but the techniques developed here can be adapted to
the Wigner case (Alexander Soshnikov, private communication). See also [H], [T3] for further results in the
Wigner case, whose methods are close to those used here, [3S] for a treatment of the GUE case, [TT] for a
treatment of the LUE case, and [S], [T^], [Z| for a treatment of the covariance matrix case. Interestingly, in
the Wigner case the outlier eigenvalues $\lambda_i(\frac{1}{\sqrt{n}}X_n + C_n)$ of the perturbed matrix are not close to the outlier
eigenvalues $\lambda_i(C_n)$ of the original matrix, but rather to the shifted eigenvalues $\lambda_i(C_n) + \frac{\sigma^2}{\lambda_i(C_n)}$, where $\sigma^2$ is
the variance of the entries of the Wigner matrix. This is ultimately because the powers $(\frac{1}{\sqrt{n}}X_n)^m$ have a
significant presence on the diagonal in the Wigner case, in contrast with the iid case where all entries are
small. Alternatively: the Wigner semicircular law has nonzero moments, while all nontrivial (pure) moments
of the circular law vanish.



Figure 2. This figure shows the eigenvalues of a single 1000 by 1000 iid random matrix
with atom distribution $x$ defined by $\mathbf{P}(x = 1) = \mathbf{P}(x = -1) = 1/2$ which was perturbed by
adding the matrix $\operatorname{diag}(2 + i, 3, 2, 0, 0, \ldots, 0)$. The small circles are centered at $2 + i$, 2, and
3, respectively, and each have radius $n^{-1/4}$ where $n = 1000$. (Figure by Phillip Wood.)



Theorem 1.7 is proven in Section 2. The main tools are asymptotics of Stieltjes transforms outside of the
unit disk $D$, combined with the fundamental matrix identity$^3$

(1) $\det(1 + AB) = \det(1 + BA)$

valid for arbitrary $n \times k$ matrices $A$ and $k \times n$ matrices $B$. Note that the left-hand side is an $n \times n$ determinant,
while the right-hand side is a $k \times k$ determinant. For low rank perturbations, we will be able to apply (1)
with $k$ bounded and $n$ going to infinity, allowing one to transform an unbounded-dimensional problem into
a finite-dimensional one.
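As a quick sanity check of the identity $\det(1 + AB) = \det(1 + BA)$, one can compare the two determinants numerically for random rectangular matrices. This is a small sketch assuming NumPy; the function name, the sizes $n = 50$, $k = 3$ and the scaling of $B$ are illustrative choices and not part of the argument.

```python
import numpy as np

def check_determinant_identity(n=50, k=3, seed=0):
    """Return det(1 + AB) (an n x n determinant) and det(1 + BA) (a k x k one)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
    # Scale B by 1/n so that both determinants stay of moderate size.
    B = (rng.standard_normal((k, n)) + 1j * rng.standard_normal((k, n))) / n
    big = np.linalg.det(np.eye(n) + A @ B)
    small = np.linalg.det(np.eye(k) + B @ A)
    return big, small

big, small = check_determinant_identity()
print(abs(big - small))  # difference should be at the level of rounding error
```

The point of the identity for this paper is visible in the shapes: the first determinant has size $n$, the second only size $k$.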



Theorem 1.7 only deals with perturbations that are relatively small, having an operator norm of $O(1)$.
It is also of interest to consider larger perturbations, such as those caused by adjusting the mean of each
coefficient of $X_n$ by $O(1)$. Here, the situation is more complicated, and we will consider only a few model
perturbations, rather than attempt to obtain the most general result.

We first consider the case of iid matrices with non-zero mean, which we write as $\frac{1}{\sqrt{n}}X_n + \mu\sqrt{n}\phi_n\phi_n^*$, where
$\mu$ is a fixed complex number (independent of $n$) and $\phi_n$ is the unit column vector $\phi_n := \frac{1}{\sqrt{n}}(1, \ldots, 1)^*$; this
corresponds to shifting the atom distribution $x$ by $\mu$ (so that it has mean $\mu$ rather than mean zero). This
is a rank one perturbation of $\frac{1}{\sqrt{n}}X_n$. The circular law still holds for this ensemble, thanks to Theorem 1.5
(or the earlier result of Chafai [19]). However, in view of Theorem 1.7, we expect a single large outlier near
$\mu\sqrt{n}$. This is indeed the case:

Theorem 1.9 (Outlier for iid matrices with nonzero mean). Let $X_n$ be an iid random matrix whose atom
distribution has finite fourth moment, and let $\mu \in \mathbf{C}$ be a non-zero quantity independent of $n$. Then almost
surely, for sufficiently large $n$, all the eigenvalues of $\frac{1}{\sqrt{n}}X_n + \mu\sqrt{n}\phi_n\phi_n^*$ lie in the disk $\{z \in \mathbf{C} : |z| \le 1 + o(1)\}$,
with a single exception taking the value $\mu\sqrt{n} + o(1)$.



$^3$We thank Percy Deift for emphasising the importance of this identity in random matrix theory.



Figure 3. This figure shows the eigenvalues of three 50 by 50 iid random matrices with
atom distribution $x$ defined by $\mathbf{P}(x = 1) = \mathbf{P}(x = -1) = 1/2$, each of which was perturbed
by adding the matrix $\mu\sqrt{n}\phi_n\phi_n^*$ where $\mu = 1$. The small circle is centered at $\sqrt{50}$ and has
radius $n^{-1/4}$ where $n = 50$. (Figure by Phillip Wood.)



We prove this result in Section 3. One can obtain more precise information on the distribution of this
exceptional eigenvalue, particularly if one assumes more moment hypotheses on the atom distribution $x$; see
PT]. The existence of this exceptional eigenvalue was already noted back in [5]. Theorem 1.9 is illustrated
in Figure 3. Figure 4 corresponds to the case of a smaller value of $\mu$, and falls instead under the regime
covered by Theorem 1.7.

Next, we consider a model that was introduced in [39], in the context of neural networks. In our notation,
this model takes the form

(2) $A_n := \frac{1}{\sqrt{n}}X_n + \mu B_n$

where $\mu > 0$ is a fixed parameter, and $B_n$ is a random matrix (independent of $X_n$) such that the columns of
$B_n$ are iid, with each column equal to $\sqrt{\frac{1-p}{p}}\,\phi_n$ with probability $p$ and $-\sqrt{\frac{p}{1-p}}\,\phi_n$ with probability $1 - p$, for
some fixed $0 < p < 1$. (In the notation of [39], the excitatory mean is $\mu\sqrt{(1-p)/p}$ and the inhibitory
mean is $-\mu\sqrt{p/(1-p)}$, and the excitatory and inhibitory variances are assumed to be equal.) Note that
one can write

(3) $A_n = \frac{1}{\sqrt{n}}X_n + \mu\sqrt{n}\phi_n\psi_n^*$

where $\psi_n$ is a random vector whose entries are iid and equal $\frac{1}{\sqrt{n}}\sqrt{\frac{1-p}{p}}$ with probability $p$ and $-\frac{1}{\sqrt{n}}\sqrt{\frac{p}{1-p}}$
with probability $1 - p$; with this normalisation, $\sqrt{n}\psi_n$ has entries of mean zero and unit variance.



Again, by Theorem 1.5, the ESD of $A_n$ is governed by the circular distribution $\mu_c$ in the limit $n \to \infty$;
however, as observed numerically in [39], a small number of outliers also appear for $A_n$.

Figure 4. This figure shows the eigenvalues of a single 1000 by 1000 iid random matrix
with atom distribution $x$ defined by $\mathbf{P}(x = 1) = \mathbf{P}(x = -1) = 1/2$ which was perturbed by
adding the matrix $\mu\sqrt{n}\phi_n\phi_n^*$ where $\mu = 2/\sqrt{1000}$. The small circle is centered at 2 and has
radius $n^{-1/4}$ where $n = 1000$. (Figure by Phillip Wood.)



It is possible to explain the outliers by the arguments of this paper. However, in contrast to the situations
in Theorem 1.7 or Theorem 1.9 in which the outliers essentially have a deterministic location up to $o(1)$
errors, for the model (2), the outliers retain significant randomness at macroscopic scales, and need to be
modeled by a point process rather than by a deterministic law. To describe this point process, we introduce
the $k$-point correlation functions $\rho_n^{(k)} : \mathbf{C}^k \to \mathbf{R}^+$ for the ESD of $A_n$ for $1 \le k \le n$, defined as the unique
symmetric function such that

$$\int_{\mathbf{C}^k} F(z_1, \ldots, z_k)\, \rho_n^{(k)}(z_1, \ldots, z_k)\, d^2z_1 \ldots d^2z_k = \mathbf{E} \sum_{i_1, \ldots, i_k \in \{1, \ldots, n\},\ \mathrm{distinct}} F(\lambda_{i_1}(A_n), \ldots, \lambda_{i_k}(A_n))$$

for all continuous, compactly supported test functions $F : \mathbf{C}^k \to \mathbf{C}$; note that the right-hand side is not
dependent on how one orders the $n$ eigenvalues of $A_n$. Here, $d^2z$ denotes two-dimensional Lebesgue measure
on $\mathbf{C}$. If the atom distribution of $A_n$ is discrete, then $\rho_n^{(k)}$ needs to be interpreted as a distribution or measure
rather than as a function, but this technicality will not concern us here.

It turns out that the correlation functions $\rho_n^{(k)}$ have a limiting law outside of the unit disk $D$ (inside the
disk, one expects these functions to go to infinity, thanks to the circular law and the choice of normalisation).
We do not have a completely explicit formula for this limit, but can describe it instead as the zeroes of a
random Laurent series. More precisely, consider the random Laurent series

$$g(z) := 1 - \mu \sum_{j=1}^\infty \frac{g_j}{z^j}$$

where $g_1, g_2, \ldots$ are iid copies of the real Gaussian distribution $N(0, 1)_{\mathbf{R}}$. From the Borel-Cantelli lemma
we see that this Laurent series is almost surely convergent in the complement $\mathbf{C} \setminus \overline{D}$ of the closed unit disk
$\overline{D}$, and almost surely has a finite number of zeroes in the region $\{z \in \mathbf{C} : |z| \ge 1 + \varepsilon\}$ for any fixed $\varepsilon > 0$.
We then define the limiting correlation function $\rho_\infty^{(k)} : (\mathbf{C} \setminus \overline{D})^k \to \mathbf{R}^+$ outside of this disk as the unique
symmetric function (or more precisely, distribution) such that

(4) $\int_{(\mathbf{C} \setminus \overline{D})^k} F(z_1, \ldots, z_k)\, \rho_\infty^{(k)}(z_1, \ldots, z_k)\, d^2z_1 \ldots d^2z_k = \mathbf{E} \sum_{w_1, \ldots, w_k \in \Lambda_g,\ \mathrm{distinct}} F(w_1, \ldots, w_k)$

for all continuous, compactly supported test functions $F : (\mathbf{C} \setminus \overline{D})^k \to \mathbf{C}$, where $\Lambda_g$ are the zeroes of $g$
(counting multiplicity); a little more explicitly, $\rho_\infty^{(k)}(z_1, \ldots, z_k)$ can be defined for distinct $z_1, \ldots, z_k \in \mathbf{C} \setminus \overline{D}$
as

$$\rho_\infty^{(k)}(z_1, \ldots, z_k) = \lim_{\varepsilon \to 0} \frac{p_\varepsilon^{(k)}(z_1, \ldots, z_k)}{(\pi\varepsilon^2)^k}$$

where $p_\varepsilon^{(k)}(z_1, \ldots, z_k)$ is the probability of the event that there is a zero of $g$ within $\varepsilon$ of $z_j$ for each
$j = 1, \ldots, k$.
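To get a feeling for the zeroes of such a random Laurent series, one can truncate the series $1 - \mu \sum_{j \ge 1} g_j z^{-j}$ after finitely many terms and locate the zeroes of the truncation as polynomial roots. The sketch below assumes NumPy; the truncation length, the value $\mu = 2$, the threshold $\varepsilon = 0.2$ and the function names are illustrative assumptions, and the zeroes of the truncated series only approximate those of the full series.

```python
import numpy as np

def laurent_value(g, z, mu):
    """Evaluate the truncated series 1 - mu * sum_{j=1}^{J} g_j / z^j at z."""
    j = np.arange(1, len(g) + 1)
    return 1 - mu * np.sum(g / z ** j)

def truncated_laurent_zeros(mu=2.0, terms=60, eps=0.2, seed=0):
    """Zeroes of the truncated series in the region |z| > 1 + eps."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(terms)  # real Gaussian coefficients g_1, ..., g_J
    # Multiplying the truncated series by z^J gives the polynomial
    # z^J - mu * (g_1 z^{J-1} + ... + g_J), whose roots are its zeroes.
    coeffs = np.concatenate(([1.0], -mu * g))
    roots = np.roots(coeffs)
    return g, roots[np.abs(roots) > 1 + eps]

g, zeros = truncated_laurent_zeros()
for w in zeros:
    print(w, abs(laurent_value(g, w, 2.0)))
```

Because the coefficients are real, the zeroes found this way are either real or occur in complex conjugate pairs.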

Remark 1.10. If the $g_1, g_2, \ldots$ were complex Gaussian $N(0, 1)_{\mathbf{C}}$ instead of real, we normalised $\mu = 1$, and we
replaced the constant coefficient 1 by another complex gaussian $g_0$, then $g(z)$ would be a Gaussian power
series (GPS) in the variable $1/z$. Gaussian power series have been intensively studied (see the recent text [29]
and the references therein). In that case, the $k$-point correlation functions are given by the determinantal
formula

$$\rho_{GPS}^{(k)}(z_1, \ldots, z_k) = \frac{1}{\pi^k} \det\Big(\frac{1}{(1 - z_i\overline{z_j})^2}\Big)_{1 \le i,j \le k};$$

see [36]. One may then hope that a somewhat analogous formula might be obtained for the random Laurent
series considered here, possibly using the explicit formulae for the correlation functions of zeroes of real
random polynomials from [38] as a starting point. We will not pursue this matter.



We can now state our main theorem regarding this model, which we prove in Section 5.

Theorem 1.11 (Limiting law). Let $X_n$ be an iid random matrix whose atom distribution $x$ is real-valued
and which is either gaussian (i.e. $x = N(0, 1)_{\mathbf{R}}$) or bounded, and let $0 < p < 1$ and $\mu > 0$ be fixed. Let $A_n$,
$\rho_n^{(k)}$, and $\rho_\infty^{(k)}$ be defined as above.

(i) (Crude upper bound) For any $\varepsilon > 0$, let $N_\varepsilon$ denote the number of eigenvalues of $A_n$ in the region
$\{z \in \mathbf{C} : |z| \ge 1 + \varepsilon\}$ (counting multiplicity). Then $\sup_n \mathbf{E}N_\varepsilon^m < \infty$ for all $\varepsilon > 0$ and $m \ge 1$.

(ii) (Limiting law) $\rho_n^{(k)}$ converges in the vague topology to $\rho_\infty^{(k)}$ on $(\mathbf{C} \setminus \overline{D})^k$. In other words, one has

$$\int F(z_1, \ldots, z_k)\, \rho_n^{(k)}(z_1, \ldots, z_k)\, d^2z_1 \ldots d^2z_k \to \int F(z_1, \ldots, z_k)\, \rho_\infty^{(k)}(z_1, \ldots, z_k)\, d^2z_1 \ldots d^2z_k$$

whenever $F : (\mathbf{C} \setminus \overline{D})^k \to \mathbf{C}$ is continuous and compactly supported (in particular, it is supported in
the region $\{(z_1, \ldots, z_k) \in \mathbf{C}^k : |z_1|, \ldots, |z_k| \ge 1 + \varepsilon\}$ for some $\varepsilon > 0$). In particular, the limiting
distribution is universal with respect to the distribution $x$.

Remark 1.12. The requirement that all the coefficients of $X_n$ are real is a natural one from the neural net
application [35]. However, in view of the better developed theory for complex Gaussian power series [5^, it
may in fact be more natural from a theoretical perspective to consider the case when the $X_n$ are complex
valued, e.g. if the atom distribution is complex Gaussian. In that case, there is a similar result to Theorem
1.11, but with the coefficients $g_j$ of the random Laurent series $g(z)$ given by complex Gaussians rather than
real Gaussians; we omit the details. The requirement that $x$ be either Gaussian or bounded is a technical
one, so that one may apply concentration inequalities; it may certainly be relaxed substantially.



Theorem 1.11 is illustrated in Figure 5.

Figure 5. This figure shows the eigenvalues of a single 1000 by 1000 iid random matrix
with atom distribution $x$ defined by $\mathbf{P}(x = 1) = \mathbf{P}(x = -1) = 1/2$ which was perturbed by
adding the random matrix $\mu B_n$ as defined after (2), where $\mu = 2$ and $p = 1/4$. (Figure by
Phillip Wood.)



1.1. Zero-sum matrices. Next, we consider a different low-rank perturbation of an iid matrix model, in
which the row sums of the matrix are forced to equal zero. Introduce the orthogonal projection matrix

$$P_n := 1 - \phi_n\phi_n^* = \Big(\delta_{ij} - \frac{1}{n}\Big)_{1 \le i,j \le n};$$

thus $P_n$ is the orthogonal projection onto the hyperplane $\phi_n^\perp = \{(x_1, \ldots, x_n) \in \mathbf{C}^n : x_1 + \ldots + x_n = 0\}$.

Our first result is that the presence of this projection does not affect the circular law, nor does it create 
any outliers: 

Theorem 1.13. Let $X_n$ be an iid random matrix. Then $\mu_{\frac{1}{\sqrt{n}}X_nP_n}$ converges both in probability and almost
surely to the circular measure $\mu_c$, and almost surely one has $\rho(\frac{1}{\sqrt{n}}X_nP_n) = 1 + o(1)$.



We prove this theorem in Section 6, using the machinery from [H]. The main difficulty is to ensure that
the least singular value of $\frac{1}{\sqrt{n}}X_nP_n - z$ is well controlled for any fixed $z$, but this can be achieved by dropping
one dimension (and freezing some of the entries of $X_n$) to eliminate the role of $P_n$ (at the cost of replacing
the deterministic matrix $z = zI$ by a more complicated matrix). This result is somewhat analogous to the
result of [17] establishing the circular law for random Markov matrices (under an additional bounded density
hypothesis on the atom distribution), although the situation is more complicated in that case because a
slightly nonlinear transformation is required in order to convert an iid random matrix to a Markov matrix, in
contrast to the simple linear transformation $X_n \mapsto X_nP_n$ required to make the rows sum to zero.



Suppose that $\psi_n$ is a vector orthogonal to $\phi_n$; then $\psi_n^* = \psi_n^*P_n$ and $P_n\phi_n = 0$. Then from two applications
of (1) we have

$$\det\Big(1 + z\Big(\frac{1}{\sqrt{n}}X_nP_n + \phi_n\psi_n^*\Big)\Big) = \det\Big(1 + zP_n\Big(\frac{1}{\sqrt{n}}X_nP_n + \phi_n\psi_n^*\Big)\Big)$$
$$= \det\Big(1 + zP_n\frac{1}{\sqrt{n}}X_nP_n\Big)$$
$$= \det\Big(1 + z\frac{1}{\sqrt{n}}X_nP_n\Big).$$

We thus see that $\frac{1}{\sqrt{n}}X_nP_n$ and $\frac{1}{\sqrt{n}}X_nP_n + \phi_n\psi_n^*$ have the same characteristic polynomial, and thus the same
ESD and the same spectral radius. (One can also establish these facts directly without much difficulty, as
was done in [35].) We thus obtain
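The equality of characteristic polynomials can be checked numerically by comparing $\det(1 + zM)$ for the two matrices at a few values of $z$. The following is an illustrative sketch assuming NumPy; the size $n = 8$, the random sign entries, the scale of $\psi_n$ and all names are assumptions made for the example.

```python
import numpy as np

def zero_sum_determinants(n=8, seed=1, scale=5.0):
    """det(1 + zM) for M = X P / sqrt(n) and for M + phi psi^*, at random z."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(n, n))
    phi = np.ones(n) / np.sqrt(n)
    P = np.eye(n) - np.outer(phi, phi)       # projection onto the zero-sum hyperplane
    psi = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    psi = scale * (P @ psi)                  # a (large) vector orthogonal to phi
    base = X @ P / np.sqrt(n)
    perturbed = base + np.outer(phi, psi.conj())   # adds phi psi^*
    zs = rng.standard_normal(5) + 1j * rng.standard_normal(5)
    d_base = np.array([np.linalg.det(np.eye(n) + z * base) for z in zs])
    d_pert = np.array([np.linalg.det(np.eye(n) + z * perturbed) for z in zs])
    return d_base, d_pert

d_base, d_pert = zero_sum_determinants()
print(np.max(np.abs(d_base - d_pert)))
```

The two arrays agree up to rounding error even though the added rank one matrix is not small, which is the mechanism behind the absence of outliers in the zero-sum model.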

Corollary 1.14. Let $X_n$ be an iid random matrix, and for each $n$, let $\psi_n$ be a (possibly random, and $X_n$-
dependent) vector which is orthogonal to $\phi_n$. Then $\mu_{\frac{1}{\sqrt{n}}X_nP_n + \phi_n\psi_n^*}$ converges both in probability and almost
surely to the circular measure $\mu_c$, and almost surely one has $\rho(\frac{1}{\sqrt{n}}X_nP_n + \phi_n\psi_n^*) = 1 + o(1)$.



In particular, no outliers are created no matter how large $\psi_n$ is (or how aligned it is with $\phi_n$). These results
were established in [39] in the gaussian case. This is in sharp contrast to the situation in Theorem 1.11 for
the model (3), which is similar to the matrix $\frac{1}{\sqrt{n}}X_nP_n + \phi_n\psi_n^*$, but without the $P_n$ projection.

Remark 1.15. Our results here are only effective in the region where the spectral parameter z has magnitude 
larger than that of the spectral radius. It would be of interest to determine what happens in models where the 
eigenvalue law of the base matrix $\frac{1}{\sqrt{n}}X_n$ is not governed by a circular law, but by another law whose support
does not occupy the entire disk given by the spectral radius (e.g. matrices whose ESD is concentrated in an 
annulus). In the covariance matrix case, results in this direction appear in [S], [7]. 
ematics. We thank Larry Abbott for raising these questions. After posing this question at the workshop,
Alice Guionnet provided the key insight, namely to reduce matters to studying coefficients of the resolvent
of $\frac{1}{\sqrt{n}}X_n$, while Percy Deift emphasised the significance of the identity (1) to questions of this type (and
indeed, this identity is crucial in order to efficiently handle the higher rank case $k > 1$). The author also
thanks Sasha Soshnikov and Phillip Wood for useful discussions, Florent Benaych-Georges, Djalil Chafai
and Raj Rao for references, and Phillip Wood for corrections. We are also indebted to Phillip Wood for
supplying the figures for this paper. Finally, we thank the anonymous referees for many helpful comments,
corrections, and references.



1.3. Asymptotic notation. Throughout this paper, $n$ is an asymptotic parameter going to infinity. We
use $o(1)$ to denote any quantity that is bounded in magnitude by an expression $c(n)$ that goes to zero as
$n \to \infty$, keeping all other parameters independent of $n$ (e.g. $x$, $\varepsilon$, $k$, $z$) fixed. Similarly, we use $X = O(Y)$ or
$X \ll Y$ to denote the estimate $|X| \le CY$ where the implied constant $C$ is independent of $n$ but may depend
on parameters independent of $n$.



2. Small low rank perturbation 



We now prove Theorem 1.7. Fix $\varepsilon > 0$ and $x$, $C_n$ as in that theorem. By hypothesis, $C_n$ has rank at most
$k$ for some $k = O(1)$ independent of $n$, and an operator norm of $O(1)$. By the singular value decomposition,
we can write $C_n = A_nB_n$ for some $n \times k$ and $k \times n$ matrices $A_n$, $B_n$, both of operator norm $O(1)$.

We have the following description$^4$ of the eigenvalues of $\frac{1}{\sqrt{n}}X_n + C_n$ in terms of a $k \times k$ determinant:

Lemma 2.1 (Eigenvalue criterion). Let $z$ be a complex number that is not an eigenvalue of $\frac{1}{\sqrt{n}}X_n$. Then $z$
is an eigenvalue of $\frac{1}{\sqrt{n}}X_n + C_n$ if and only if

(5) $\det\big(1 + B_n(\frac{1}{\sqrt{n}}X_n - z)^{-1}A_n\big) = 0.$

Proof. Clearly, $z$ is an eigenvalue of $\frac{1}{\sqrt{n}}X_n + C_n$ if and only if

$$\det\Big(\frac{1}{\sqrt{n}}X_n + C_n - z\Big) = 0.$$

By hypothesis, $\frac{1}{\sqrt{n}}X_n - z$ is invertible and $C_n = A_nB_n$, so we may rewrite this equation as

$$\det\Big(1 + \Big(\frac{1}{\sqrt{n}}X_n - z\Big)^{-1}A_nB_n\Big) = 0.$$

The claim now follows from (1). □

Remark 2.2. The above argument in fact shows that

$$\det\Big(1 + B_n\Big(\frac{1}{\sqrt{n}}X_n - z\Big)^{-1}A_n\Big) = \frac{\det(\frac{1}{\sqrt{n}}X_n + C_n - z)}{\det(\frac{1}{\sqrt{n}}X_n - z)} = \frac{\prod_{i=1}^n (\lambda_i(\frac{1}{\sqrt{n}}X_n + C_n) - z)}{\prod_{i=1}^n (\lambda_i(\frac{1}{\sqrt{n}}X_n) - z)}$$

whenever the denominator is nonzero. We are indebted to Alice Guionnet for suggesting the use of this type
of criterion, versions of which also appear in [3], [8], [18], [13], [M], [15]. For instance, the $k = 1$ case of this
criterion appears explicitly in [13]. In [J the expression in this identity (which is stated there in the case
of symmetric matrices) is referred to as the modified Weinstein determinant. (We thank Raj Rao for this
reference.)
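The reduction from an $n \times n$ determinant to a $k \times k$ one in the eigenvalue criterion can be illustrated numerically: for a fixed $z$ outside the spectrum of the base matrix, $\det(1 + B(M - z)^{-1}A)$ should equal $\det(M + AB - z)/\det(M - z)$. The sketch below assumes NumPy, with $M$ standing in for $\frac{1}{\sqrt{n}}X_n$; the sizes, the test point $z$ and the names are illustrative assumptions.

```python
import numpy as np

def eigenvalue_criterion_sides(n=30, k=2, z=2.0 + 1.0j, seed=0):
    """Return the k x k determinant and the ratio of n x n determinants."""
    rng = np.random.default_rng(seed)
    M = rng.choice([-1.0, 1.0], size=(n, n)) / np.sqrt(n)  # plays the role of X_n / sqrt(n)
    A = rng.standard_normal((n, k))
    B = rng.standard_normal((k, n)) / np.sqrt(n)            # C = A B has rank at most k
    I_n = np.eye(n)
    # k x k determinant det(1 + B (M - z)^{-1} A), via a linear solve.
    small = np.linalg.det(np.eye(k) + B @ np.linalg.solve(M - z * I_n, A))
    # Ratio of the two n x n determinants.
    ratio = np.linalg.det(M + A @ B - z * I_n) / np.linalg.det(M - z * I_n)
    return small, ratio

small, ratio = eigenvalue_criterion_sides()
print(abs(small - ratio))
```

In particular, zeroes of the small determinant in $z$ correspond to eigenvalues of the perturbed matrix, as long as $z$ avoids the spectrum of the unperturbed one.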

Introduce the functions

$$f(z) := \det\Big(1 + B_n\Big(\frac{1}{\sqrt{n}}X_n - z\Big)^{-1}A_n\Big)$$
$$g(z) := \det\big(1 + B_n(-z)^{-1}A_n\big).$$

These are both meromorphic functions that are asymptotically equal to 1 at infinity, with $g$ being a rational
function of degree at most $k$ with bounded coefficients. Lemma 2.1 tells us that outside of the spectrum of
$\frac{1}{\sqrt{n}}X_n$, the zeroes of $f(z)$ agree with the eigenvalues of $\frac{1}{\sqrt{n}}X_n + C_n$. An inspection of the argument also
reveals that the multiplicity of a given such eigenvalue is equal to the degree of the corresponding zero of $f$.

Similarly, replacing $\frac{1}{\sqrt{n}}X_n$ by the zero matrix in Lemma 2.1, we see that outside of the origin, the zeroes of
$g$ are precisely the eigenvalues of $C_n$ (counting multiplicity). Indeed, from (1) one has

$$g(z) = \prod_{i=1}^k \Big(1 - \frac{\lambda_i(C_n)}{z}\Big)$$

where $\lambda_1(C_n), \ldots, \lambda_k(C_n)$ are the $k$ non-trivial eigenvalues of $C_n$ (some of which may be zero), including of
course the $j$ eigenvalues of magnitude at least $1 + 3\varepsilon$.



$^4$We are indebted to Alice Guionnet for proposing the $k = 1$ case of this formula, as well as the basic strategy of proof used
in this paper, and Percy Deift for emphasising the importance of the identity (1).
By Theorem 1.4, we see that almost surely, the spectrum of $\frac{1}{\sqrt{n}}X_n$ is contained in the disk $\{z \in \mathbf{C} : |z| \le
1 + \varepsilon\}$ for sufficiently large $n$. In view of Rouché's theorem (or the argument principle), together with the
fact that the coefficients of $g$ are bounded, it then suffices to show that the quantity

$$\sup_{|z| \ge 1 + 2\varepsilon} |f(z) - g(z)|$$

converges almost surely to zero. Since $k$ is fixed and $B_n$, $A_n$ are bounded in operator norm, it suffices to
show that

$$\sup_{|z| \ge 1 + 2\varepsilon} \Big\|B_n\Big(\Big(\frac{1}{\sqrt{n}}X_n - z\Big)^{-1} - (-z)^{-1}\Big)A_n\Big\|_{op}$$

converges almost surely to zero.
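By Theorem 1.4, we almost surely have

$$\Big\|\Big(\frac{1}{\sqrt{n}}X_n\Big)^m\Big\|_{op} = m + 1 + o(1)$$

for each $m \ge 1$. In particular, there exists $m_0 \ge 1$ such that

$$\Big\|\Big(\frac{1}{\sqrt{n}}X_n\Big)^{m_0}\Big\|_{op} \le (1 + \varepsilon)^{m_0}$$

for sufficiently large $n$, which also implies that one has

$$\Big\|\Big(\frac{1}{\sqrt{n}}X_n\Big)^m\Big\|_{op} \le K(1 + \varepsilon)^m$$

for all $n, m \ge 1$ and some almost surely finite random variable $K$. This ensures that the Neumann series

$$\Big(\frac{1}{\sqrt{n}}X_n - z\Big)^{-1} = -\sum_{m=0}^\infty \frac{(\frac{1}{\sqrt{n}}X_n)^m}{z^{m+1}}$$

is absolutely convergent in the operator norm, uniformly in both $n$ and $z$, when $n$ is sufficiently large and
$|z| \ge 1 + 2\varepsilon$. By the dominated convergence theorem, it thus suffices to show that the $k \times k$ matrix

$$B_n\Big(\frac{1}{\sqrt{n}}X_n\Big)^m A_n$$

converges almost surely to zero for each fixed $m \ge 1$. Breaking $B_n$ and $A_n$ into components, it suffices to
show the following claim: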

Lemma 2.3 (Coefficient bound). Let $X_n$ be an iid random matrix whose atom distribution has finite fourth
moment. Then

(6) $\Big\langle \Big(\frac{1}{\sqrt{n}}X_n\Big)^m u_n, v_n \Big\rangle = o(1)$

almost surely for each fixed $m \ge 1$ and any fixed (deterministic) sequence of unit vectors $u_n, v_n \in \mathbf{C}^n$.



We now prove the lemma. 



The atom distribution $x$ is currently only assumed to have finite fourth moment. However, a standard
truncation argument (using the results of [9] to control the contribution of the tail of $x$) shows that we may
almost surely approximate $\frac{1}{\sqrt{n}}X_n$ to arbitrary accuracy in operator norm by an iid matrix in which the atom
distribution is in fact bounded. As such, it will suffice to prove the lemma under the additional assumption
that $x$ is bounded. In particular, all moments of $x$ are now finite: $\mathbf{E}|x|^j = O(1)$ for all fixed $j$.

By diagonalising the covariance matrix of $\mathrm{Re}(x)$ and $\mathrm{Im}(x)$, we may assume (after a phase rotation) that
$\mathrm{Re}(x)$ and $\mathrm{Im}(x)$ have zero covariance, and have variances $\sigma^2$ and $1 - \sigma^2$ respectively for some $0 \le \sigma \le 1$.
Next, by splitting $u_n, v_n$ into real and imaginary parts and renormalising, we may assume without loss of
generality that $u_n, v_n$ have real coefficients.
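We now use the second moment method. It will suffice to show that

$$\mathbf{E}\Big|\Big\langle \Big(\frac{1}{\sqrt{n}}X_n\Big)^m u_n, v_n \Big\rangle\Big|^2 = o(1).$$

We first deal with the model case when $x$ is normally distributed with the normal distribution $N(0, \sigma^2)_{\mathbf{R}} +
iN(0, 1 - \sigma^2)_{\mathbf{R}}$ (with the real and imaginary parts independent), in which case we write $G_n$ for $X_n$. As is well known, in this case the ensemble
$G_n$ is invariant under left and right multiplication by orthogonal matrices. This implies that the expression
$\mathbf{E}|\langle (\frac{1}{\sqrt{n}}G_n)^m u_n, v_n \rangle|^2$ is in fact independent of the choice of the real unit vectors $u_n, v_n$. Letting $u_n, v_n$ range
over the standard basis $e_1, \ldots, e_n$ and averaging, we conclude that

$$\mathbf{E}\Big|\Big\langle \Big(\frac{1}{\sqrt{n}}G_n\Big)^m u_n, v_n \Big\rangle\Big|^2 = \frac{1}{n^2} \mathbf{E} \operatorname{trace} \Big(\frac{1}{\sqrt{n}}G_n\Big)^m \Big(\Big(\frac{1}{\sqrt{n}}G_n\Big)^m\Big)^* \le \frac{1}{n} \mathbf{E}\Big\|\Big(\frac{1}{\sqrt{n}}G_n\Big)^m\Big\|_{op}^2.$$

By Theorem 1.4, we thus obtain a bound of $O(1/n) = o(1)$ in this case.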



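To handle the non-gaussian case, it thus suffices to show that

$$\mathbf{E}\Big|\Big\langle \Big(\frac{1}{\sqrt{n}}X_n\Big)^m u_n, v_n \Big\rangle\Big|^2 - \mathbf{E}\Big|\Big\langle \Big(\frac{1}{\sqrt{n}}G_n\Big)^m u_n, v_n \Big\rangle\Big|^2 = o(1).$$

We expand the left-hand side as

$$n^{-m} \sum_{a_0, \ldots, a_m, b_0, \ldots, b_m \in \{1, \ldots, n\}} u_{n,a_0} v_{n,a_m} u_{n,b_0} v_{n,b_m}\, \mathbf{E}\Big( \prod_{j=1}^m x_{a_j,a_{j-1}} \overline{x_{b_j,b_{j-1}}} - \prod_{j=1}^m g_{a_j,a_{j-1}} \overline{g_{b_j,b_{j-1}}} \Big),$$

where $u_{n,i}, v_{n,i}$ are the coordinates of the unit vectors $u_n, v_n$, and $g_{ij}$ are iid copies of the complex normal
distribution $N(0, \sigma^2)_{\mathbf{R}} + iN(0, 1 - \sigma^2)_{\mathbf{R}}$.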

Consider the collection of $2m$ ordered pairs $(a_{j-1}, a_j)$, $(b_{j-1}, b_j)$ with $j = 1, \ldots, m$. From the iid nature of
$x$, and the fact that the random variables $x$ and $g$ match up to second order, we see that each summand in
the above expression vanishes unless each ordered pair appears with multiplicity at least two, and at least
one pair appears at least three times; in particular, there are at most $m - 1$ distinct ordered pairs. As $x, g$
have all moments finite, we may thus bound the above expression in magnitude by

$$O\Big(n^{-m} \sum\nolimits^* |u_{n,a_0}||v_{n,a_m}||u_{n,b_0}||v_{n,b_m}|\Big)$$

where the sum ranges over all tuples $a_0, \ldots, a_m, b_0, \ldots, b_m$ which generate at most $m - 1$ distinct ordered
pairs.

By the arithmetic mean-geometric mean inequality, we may bound

$$|u_{n,a_0}||v_{n,a_m}||u_{n,b_0}||v_{n,b_m}| \le |u_{n,a_0}|^2|u_{n,b_0}|^2 + |v_{n,a_m}|^2|u_{n,b_0}|^2 + |u_{n,a_0}|^2|v_{n,b_m}|^2 + |v_{n,a_m}|^2|v_{n,b_m}|^2,$$

thus bounding the previous expression by the sum of four terms, of which

(7) $O\Big(n^{-m} \sum\nolimits^* |u_{n,a_0}|^2|v_{n,b_m}|^2\Big)$

is typical. We will just show that the stated term (7) is $o(1)$; the other three terms are treated by a similar
argument.

A given tuple $(a_0, \ldots, a_m, b_0, \ldots, b_m)$ in the sum has at most $m - 1$ distinct ordered pairs, which then gives rise to an
unordered, looped graph on $\{1, \ldots, n\}$ that is the union of two paths of length $m$, and thus either has one
or two connected components (excluding vertices with no edges). Suppose first that the graph has one
connected component. Then, after fixing $a_0$, there are only $O(n^{m-1})$ possible ways to choose the remaining
components of the tuple; bounding $v_{n,b_m}$ crudely by $O(1)$, we see that the contribution of this case to (7) is

$$O\Big(n^{-m} \sum_{a_0=1}^n |u_{n,a_0}|^2 n^{m-1}\Big) = O(n^{-1}) = o(1)$$

which is acceptable.

Now suppose that there are two connected components; then $a_0, \ldots, a_m$ and $b_0, \ldots, b_m$ occupy distinct
components. After fixing both $a_0$ and $b_m$, there are at most $O(n^{m-1})$ ways to choose the remaining components
of the tuple; this gives a contribution of

$$O\Big(n^{-m} \sum_{a_0=1}^n \sum_{b_m=1}^n |u_{n,a_0}|^2|v_{n,b_m}|^2 n^{m-1}\Big) = O(n^{-1}) = o(1)$$

which is again acceptable. This completes the proof of Lemma 2.3 and hence Theorem 1.7.

Remark 2.4. One could also use concentration of measure inequalities (see e.g. [30]) to establish Lemma
2.3, but the moment methods used above will be useful when we establish a central limit theorem variant of
Lemma 2.3 in Section 4 below.



3. Large mean 



Now we prove Theorem 1.9. It suffices to show that for any fixed $\varepsilon > 0$, almost surely one has exactly one eigenvalue of $\frac{1}{\sqrt n}X_n + \mu\sqrt n\,\phi_n\phi_n^*$ outside of the disk $\{z : |z| \le 1+2\varepsilon\}$, with this eigenvalue occurring within $O(\varepsilon)$ of $\mu\sqrt n$.

Fix $\varepsilon$. By Theorem 1.4, almost surely there are no eigenvalues of $\frac{1}{\sqrt n}X_n$ outside of the disk. By Lemma 2.1, the eigenvalues of $\frac{1}{\sqrt n}X_n + \mu\sqrt n\,\phi_n\phi_n^*$ outside this disk are then precisely the solutions (counting multiplicity) to the equation $f(z) = 0$, where $f$ is the meromorphic function

$$f(z) := 1 + \mu\sqrt n\,\Big\langle \Big(\tfrac{1}{\sqrt n}X_n - z\Big)^{-1}\phi_n,\ \phi_n\Big\rangle.$$

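By Neumann series, we may expand

$$f(z) = g(z) - \mu\sqrt n\sum_{m=1}^{\infty}\frac{\big\langle(\frac{1}{\sqrt n}X_n)^m\phi_n,\ \phi_n\big\rangle}{z^{m+1}},$$

where

(8) $g(z) := 1 - \mu\sqrt n/z.$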
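From Theorem 1.4 we almost surely have

$$\Big\|\tfrac{1}{\sqrt n}X_n\Big\|_{op} \le 3$$

(say) and

$$\Big\|\Big(\tfrac{1}{\sqrt n}X_n\Big)^{m_0}\Big\|_{op} \le (1+\varepsilon)^{m_0}$$

for some fixed integer $m_0$ depending only on $\varepsilon$, and all sufficiently large $n$; this gives us the truncated Taylor expansion

$$f(z) = g(z) - \mu\sqrt n\sum_{m=1}^{M}\frac{\big\langle(\frac{1}{\sqrt n}X_n)^m\phi_n,\ \phi_n\big\rangle}{z^{m+1}} + O\Big(\frac{\sqrt n}{|z|^2}\Big(\frac{1+\varepsilon}{1+2\varepsilon}\Big)^{M}\Big)$$

for any $M \ge 1$ and $|z| \ge 1+2\varepsilon$, where the implied constant is allowed to depend on $\varepsilon,\mu$ but not on $M$.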



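Applying Lemma 2.3, we almost surely obtain

$$f(z) = g(z) + o\Big(\frac{\sqrt n}{|z|^2}\Big) + O\Big(\frac{\sqrt n}{|z|^2}\Big(\frac{1+\varepsilon}{1+2\varepsilon}\Big)^{M}\Big)$$

for any fixed $M$, uniformly for all $|z| \ge 1+2\varepsilon$, and thus (by letting $M$ go slowly to infinity) we almost surely have

$$f(z) = g(z) + o\Big(\frac{\sqrt n}{|z|^2}\Big)$$

uniformly for all $|z| \ge 1+2\varepsilon$. From this and (8) we see that $f$ has no zeroes in this region except within a $o(1)$ neighbourhood of $\mu\sqrt n$, and from Rouché's theorem we see that there is exactly one zero of that latter type when $n$ is sufficiently large, and the claim follows.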
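As a numerical sanity check of the conclusion just reached, the following sketch (not part of the original argument) locates the dominant eigenvalue of $\frac{1}{\sqrt n}X_n + \mu\sqrt n\,\phi_n\phi_n^*$ by plain power iteration and compares it with $\mu\sqrt n$. The Gaussian atom distribution, the choice $\phi_n = (1,\dots,1)/\sqrt n$, the function name, and the values $n = 200$, $\mu = 1$ are all illustrative assumptions.

```python
import math
import random


def outlier_by_power_iteration(n, mu, seed=0, iters=60):
    """Estimate the dominant eigenvalue of X/sqrt(n) + mu*sqrt(n)*phi*phi^T.

    Assumptions for illustration only: X has iid standard Gaussian entries
    and phi = (1, ..., 1)/sqrt(n), so that mu*sqrt(n)*phi*phi^T is the
    constant matrix whose entries all equal mu/sqrt(n).
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n)
    # Entry (i, j) of the perturbed matrix is (x_ij + mu) / sqrt(n).
    a = [[(rng.gauss(0.0, 1.0) + mu) * scale for _ in range(n)] for _ in range(n)]
    v = [scale] * n  # start at phi, which is close to the outlier eigenvector
    lam = 0.0
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(n)) for row in a]
        lam = sum(w[i] * v[i] for i in range(n))  # Rayleigh quotient <Av, v>
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return lam


if __name__ == "__main__":
    n, mu = 200, 1.0
    print("estimated outlier:", outlier_by_power_iteration(n, mu))
    print("mu * sqrt(n):     ", mu * math.sqrt(n))
```

Power iteration is used here only because the outlier is well separated from the rest of the spectrum, which lies near the unit disk; it says nothing about the bulk eigenvalues.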

Remark 3.1. The eigenvector corresponding to the exceptional zero can be explicitly described, by observing the identity

$$\Big(\tfrac{1}{\sqrt n}X_n + \mu\sqrt n\,\phi_n\phi_n^* - z\Big)\Big(1 - \tfrac{1}{z\sqrt n}X_n\Big)^{-1}\phi_n = -z f(z)\,\phi_n$$

for all non-zero $z$ outside of the spectrum of $\frac{1}{\sqrt n}X_n$. In particular, if $z = \mu\sqrt n + o(1)$ is the outlier eigenvalue, then $(1 - \frac{1}{z\sqrt n}X_n)^{-1}\phi_n$ is the corresponding eigenvector. From Theorem 1.4 and Neumann series, we see that almost surely, this eigenvector lies within $O(1/\sqrt n)$ of $\phi_n$ in norm, and a more accurate description of this eigenvector can be given by expanding out the Neumann series further.



4. A CENTRAL LIMIT THEOREM 



In Lemma 2.3 we showed that the coefficients $\langle(\frac{1}{\sqrt n}X_n)^m u_n, v_n\rangle$ decayed almost surely to zero. Now we prove a more refined statement on the rate of decay, which is to Lemma 2.3 as the central limit theorem is to the (strong) law of large numbers. This result will be needed to prove Theorem 1.11. For simplicity we consider only real-valued matrices; there is a complex analogue when the real and imaginary parts of $x$ have the same covariance matrix as the complex Gaussian $N(0,1)_{\mathbf C}$, but we will not state it here.

Proposition 4.1 (Central limit theorem). Let $X_n$ be an iid random matrix whose atom distribution is real-valued, has mean zero and has all moments finite. Let $u_n, v_n \in \mathbf R^n$ be a (deterministic) sequence of unit vectors whose coefficients $u_{n,i}, v_{n,i}$ are asymptotically delocalised in the sense that

(9) $\sup_{1\le i\le n}|u_{n,i}|,\ \sup_{1\le i\le n}|v_{n,i}| = O(1/\sqrt n).$

Then for any fixed $m \ge 1$, the $m$ random variables

(10) $Z_j := \sqrt n\,\Big\langle\Big(\tfrac{1}{\sqrt n}X_n\Big)^j u_n,\ v_n\Big\rangle$

for $j = 1,\dots,m$ converge jointly in distribution to the law of $m$ independent copies of the real gaussian $N(0,1)_{\mathbf R}$.
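To make the normalisation in (10) concrete, here is a small Monte Carlo sketch (an illustration, not part of the proof) that samples $Z_1$ and $Z_2$ and checks that their empirical second moments are close to 1, as the proposition predicts for large $n$. The Gaussian entries, the choice $u_n = v_n = (1,\dots,1)/\sqrt n$, the function names, and the sample sizes are assumptions made for the example.

```python
import math
import random


def sample_z(n, m, rng):
    """Return [Z_1, ..., Z_m] with Z_j = sqrt(n) * <(X/sqrt(n))^j u, v>.

    Illustrative choices: Gaussian entries and u = v = (1, ..., 1)/sqrt(n),
    which satisfies the delocalisation condition (9).
    """
    scale = 1.0 / math.sqrt(n)
    x = [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(n)]
    u = [scale] * n
    v = [scale] * n
    zs = []
    w = u
    for _ in range(m):
        # After the j-th pass, w equals (X/sqrt(n))^j u.
        w = [sum(row[k] * w[k] for k in range(n)) for row in x]
        zs.append(math.sqrt(n) * sum(w[i] * v[i] for i in range(n)))
    return zs


def empirical_second_moments(n=30, m=2, trials=400, seed=1):
    """Average of Z_j^2 over independent samples, for j = 1, ..., m."""
    rng = random.Random(seed)
    samples = [sample_z(n, m, rng) for _ in range(trials)]
    return [sum(s[j] ** 2 for s in samples) / trials for j in range(m)]


if __name__ == "__main__":
    print(empirical_second_moments())
```

With these choices $Z_1$ is exactly a standard Gaussian, while $Z_2$ is only approximately Gaussian at finite $n$, so the second value is expected to be close to, but not exactly, 1.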

We now prove this proposition. By (the multidimensional version of) Carleman's theorem (see e.g. [B]), it suffices to prove the moment bounds

(11) $\mathbf E\prod_{j=1}^m Z_j^{r_j} = \mathbf E\prod_{j=1}^m G_j^{r_j} + o(1)$

for any natural numbers $r_1,\dots,r_m$, where $G_1,\dots,G_m$ are iid copies of the real gaussian $N(0,1)_{\mathbf R}$.

Fix $m, r_1,\dots,r_m$; we allow all implied constants to depend on these quantities. The left-hand side can be expanded as

(12) $n^{-A}\sum_{*}\mathbf E\prod_{j=1}^m\prod_{l=1}^{r_j}\Big(U_{a_{j,l,0}}V_{a_{j,l,j}}\prod_{i=1}^{j}x_{a_{j,l,i},a_{j,l,i-1}}\Big)$

where $U_i := \sqrt n\,u_{n,i}$, $V_i := \sqrt n\,v_{n,i}$, $A := \frac12\sum_{j=1}^m r_j(j+1)$, and $*$ ranges over all tuples of indices $a_{j,l,i} \in \{1,\dots,n\}$ with $1 \le j \le m$, $1 \le l \le r_j$, $0 \le i \le j$.

From (9) and the bounded moment hypotheses, we see that each summand in (12) is of size $O(1)$, and so we may freely ignore up to $o(n^A)$ summands whenever desired.

For each tuple in the sum we consider the $\sum_{j=1}^m j r_j$ ordered pairs $(a_{j,l,i-1}, a_{j,l,i})$ with $1 \le j \le m$, $1 \le l \le r_j$, $1 \le i \le j$. By the iid and mean zero nature of the $x_{i,j}$, we see that the summand vanishes unless each ordered pair appears with multiplicity at least two. In particular each of the indices $a_{j,l,i}$ with $0 \le i \le j$ must occur with multiplicity at least 2, leading to at most $A$ distinct indices. If there are any fewer than $A$ distinct indices, then the contribution of this case to (12) is $o(1)$, so we may assume that there are exactly $A$ distinct indices, which implies that each index $a_{j,l,i}$ occurs with multiplicity exactly two. This implies that for $1 \le i \le j$, each of the $a_{j,l,i-1}$ arise exactly twice as the initial vertex of an ordered pair, and each of the $a_{j,l,i}$ arise exactly twice as the final vertex of an ordered pair. Furthermore, the indices $a_{j,l,0}$ must be distinct from the indices $a_{k,l',i}$ with $0 < i \le k$, and similarly the $a_{j,l,j}$ must be distinct from the indices $a_{k,l',i}$ with $0 \le i < k$, as otherwise the fact that each ordered pair appears at least twice will lead to a multiplicity of at least three at the repeated index. This implies that the $\sum_{j=1}^m r_j$ paths $(a_{j,l,0},\dots,a_{j,l,j})$ are simple paths, which each occur with multiplicity two but are otherwise disjoint. The total contribution of this case to (12) is then

$$n^{-A}\sum_{**}\prod_{j=1}^m\prod_{l=1}^{r_j}U_{a_{j,l,0}}V_{a_{j,l,j}},$$

where the sum $\sum_{**}$ is over collections of simple paths $(a_{j,l,0},\dots,a_{j,l,j})$ in $\{1,\dots,n\}$ which occur with multiplicity two but are otherwise disjoint; here we use the fact that each ordered pair appears exactly twice, and that $x$ has unit variance.



In order for all paths to appear with multiplicity two, each of the $r_j$ must be even. This already gives (11) when at least one of the $r_j$ is odd, so we now assume that the $r_j$ are all even. There are

$$\prod_{j=1}^m\frac{r_j!}{2^{r_j/2}(r_j/2)!}$$

different ways in which the paths can be matched up to multiplicity two. Once one fixes such a matching, there are $R := \frac12\sum_{j=1}^m r_j$ initial vertices $b_1,\dots,b_R$ of paths, and $R$ final vertices $c_1,\dots,c_R$, all distinct from each other; if one fixes these vertices, then one has $(1+o(1))\,n^{\sum_{j=1}^m r_j(j-1)/2} = (1+o(1))\,n^{A-2R}$ ways to choose the remaining paths. This gives a total contribution to (12) of

$$(1+o(1))\,n^{-2R}\prod_{j=1}^m\frac{r_j!}{2^{r_j/2}(r_j/2)!}\sum_{b_1,\dots,b_R,c_1,\dots,c_R\ \text{distinct}}\ \prod_{i=1}^R U_{b_i}^2V_{c_i}^2.$$

We add back in the contributions in which some of the $b_1,\dots,b_R,c_1,\dots,c_R$ coincide; this only affects the sum by $o(n^{2R})$, which is acceptable. We are left with

$$(1+o(1))\,n^{-2R}\prod_{j=1}^m\frac{r_j!}{2^{r_j/2}(r_j/2)!}\sum_{b_1,\dots,b_R,c_1,\dots,c_R\in\{1,\dots,n\}}\ \prod_{i=1}^R U_{b_i}^2V_{c_i}^2;$$

since the $U_b$ and $V_c$ square-sum to $n$, this simplifies to

$$(1+o(1))\prod_{j=1}^m\frac{r_j!}{2^{r_j/2}(r_j/2)!},$$

and (11) follows from the standard computation

$$\mathbf E G^r = \frac{r!}{2^{r/2}(r/2)!}$$

for $r$ even.
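The combinatorial factor above counts the ways to pair up $r$ objects, and the same number is the $r$-th moment of a standard real Gaussian. The short sketch below (function names are illustrative, not from the source) checks the closed form against a brute-force enumeration of pairings for small $r$.

```python
import math


def gaussian_even_moment(r):
    """Closed form r! / (2^(r/2) (r/2)!) for E G^r when r is even; 0 for odd r."""
    if r % 2:
        return 0
    return math.factorial(r) // (2 ** (r // 2) * math.factorial(r // 2))


def count_pairings(items):
    """Count the ways to split `items` into unordered pairs by brute force."""
    if not items:
        return 1
    rest = items[1:]
    # Pair the first item with each possible partner, then recurse on the rest.
    return sum(count_pairings(rest[:k] + rest[k + 1:]) for k in range(len(rest)))


if __name__ == "__main__":
    for r in range(0, 9):
        print(r, count_pairings(list(range(r))), gaussian_even_moment(r))
```

For odd $r$ no complete pairing exists, which mirrors the observation that (11) is immediate when some $r_j$ is odd.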



5. Large non-selfadjoint perturbation 



We now prove Theorem 1.11. It suffices to work in the exterior region $\Omega := \{z : |z| > 1+4\varepsilon\}$ for a fixed $\varepsilon > 0$. Henceforth all implied constants may depend on $\varepsilon$, $p$, $\mu$.

5.1. Crude upper bound. We first show the first part of the theorem, namely that $\sup_n \mathbf E N_\Omega^m < \infty$ for all $m \ge 1$. Fix $m$; we allow all implied constants to depend on $m$. Our task is now to show that $\mathbf E N_\Omega^m = O(1)$ for all sufficiently large $n$.



From Theorem 1.4 we know that the spectral radius of $\frac{1}{\sqrt n}X_n$ is at most $1+\varepsilon$ with overwhelming probability. Using the trivial bound $N_\Omega \le n$, we see that the tail event when the spectral radius exceeds $1+\varepsilon$ thus gives a negligible contribution to $\mathbf E N_\Omega^m$ and will thus be ignored.

Conditioning on the event that the spectral radius is at most $1+\varepsilon$, we may then apply Lemma 2.1 to conclude that the eigenvalues of $A_n$ in $\Omega$ are precisely the zeroes in $\Omega$ of the random analytic function

(13) $f(z) := 1 + \mu\sqrt n\,\Big\langle\Big(\tfrac{1}{\sqrt n}X_n - zI\Big)^{-1}\phi_n,\ \psi_n\Big\rangle.$

In particular, $N_\Omega$ is the number of zeroes of $f$ in $\Omega$.

The function $f(1/w)$ is analytic in the disk $\{w \in \mathbf C : |w| < \frac{1}{1+\varepsilon}\}$ and equals 1 at the origin, with $N_\Omega$ zeroes in the region $\{w \in \mathbf C : |w| < \frac{1}{1+4\varepsilon}\}$. Applying Jensen's formula to this function, we conclude the upper bound

$$N_\Omega \ll \int_{|z|=r}\log_+\frac{1}{|f(z)|}\,|dz|$$

for any radius $r$ between $1+2\varepsilon$ and $1+3\varepsilon$ (note that we allow implied constants to depend on $\varepsilon$), where $\log_+ x := \max(\log x, 0)$ and $|dz|$ is arclength measure. Averaging, we conclude that

(14) $N_\Omega \ll \int_{1+2\varepsilon\le|z|\le1+3\varepsilon}\log_+\frac{1}{|f(z)|}\,d^2z.$
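For readers who want to see Jensen's formula in action, the sketch below checks its classical form numerically for a polynomial $F$ with $F(0) = 1$: the sum of $\log(\rho/|w_k|)$ over the zeroes inside the circle $|w| = \rho$ equals the average of $\log|F|$ over that circle, and dividing by $\log(\rho/\rho')$ bounds the number of zeroes in the smaller disk $|w| < \rho'$. The test polynomial, the radii, and the function names are illustrative assumptions; this is the textbook identity rather than the specific estimate (14).

```python
import cmath
import math


def jensen_sides(zeros, rho, samples=4096):
    """Return both sides of Jensen's formula for F(w) = prod(1 - w / w_k).

    lhs: sum of log(rho / |w_k|) over zeros with |w_k| < rho.
    rhs: average of log|F| over the circle |w| = rho (equispaced samples).
    Assumes no zero lies on the circle itself.
    """
    def big_f(w):
        value = 1.0 + 0.0j
        for wk in zeros:
            value *= 1.0 - w / wk
        return value

    lhs = sum(math.log(rho / abs(wk)) for wk in zeros if abs(wk) < rho)
    rhs = sum(
        math.log(abs(big_f(rho * cmath.exp(2j * math.pi * k / samples))))
        for k in range(samples)
    ) / samples
    return lhs, rhs


def zero_count_bound(zeros, rho, inner):
    """Jensen upper bound on the number of zeros with |w| < inner < rho."""
    _, rhs = jensen_sides(zeros, rho)
    return rhs / math.log(rho / inner)


if __name__ == "__main__":
    example_zeros = [0.3 + 0.1j, -0.5j, 0.2 - 0.4j, 1.5]
    print(jensen_sides(example_zeros, 0.8))
    print(zero_count_bound(example_zeros, 0.8, 0.6))
```

The gap between the radii $\rho$ and $\rho'$ plays the role of the gap between $1+3\varepsilon$ and $1+4\varepsilon$ in the text, which is why the implied constant there depends on $\varepsilon$.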



It will thus suffice to establish the bound

(15) $\mathbf E\int_K\log_+^m\frac{1}{|f(z)|}\,d^2z \ll 1$

for any fixed $m \ge 1$ and any compact subset $K$ of the region $|z| > 1+\varepsilon$ (allowing implied constants to depend on $m, K$ of course).



We now pause to regularise the logarithm slightly, as this will come in handy later. By Remark 2.2, the function $\log|f|$ is a linear combination of $O(n)$ terms of the form $\log|z - z_0|$ for various complex numbers $z_0$. As such we have a crude upper bound of the form

(16) $\int_K\log_+^{m+1}\frac{1}{|f(z)|}\,d^2z \ll n^{m+1}$

thanks to the triangle inequality. From this, we have

$$\int_K\log_+^{m}\frac{1}{|f(z)|}\,d^2z \le \int_K\min\Big(\log_+\frac{1}{|f(z)|},\ n^{m+1}\Big)^m d^2z + O(1)$$

and so it suffices to show that

$$\mathbf E\int_K\min\Big(\log_+\frac{1}{|f(z)|},\ n^{m+1}\Big)^m d^2z \ll 1.$$

By the Fubini-Tonelli theorem, it thus suffices to show that

$$\mathbf E\min\Big(\log_+\frac{1}{|f(z)|},\ n^{m+1}\Big)^m = O(1)$$

uniformly for all $z$ in $K$.

It will then suffice to show the following lower tail estimates on $f$:

Lemma 5.1 (Lower tail estimates). Let $z$ be in a compact subset $K$ of $\{z \in \mathbf C : |z| > 1+\varepsilon\}$; we allow implied constants to depend on $K$.

(i) For every $A > 0$ there exists $B > 0$ (not depending on $n$) such that

$$\mathbf P(|f(z)| \le n^{-B}) \ll n^{-A}.$$

(ii) For every $\delta > 0$ one has

$$\mathbf P(|f(z)| \le \delta) \ll \delta + n^{-c}$$

for some absolute constant $c > 0$.

Indeed, item (i) (with $A$ sufficiently large depending on $m$) allows one to reduce to the case when $\log_+\frac{1}{|f(z)|} = O(\log n)$ (it is here that we take advantage of our previous regularisation of the logarithm), and then (ii) and a dyadic decomposition gives the claim.

Proof. We begin with (i). From ([S]) and (13) we have the identity

$$(A_n - zI)\Big(\tfrac{1}{\sqrt n}X_n - zI\Big)^{-1}\phi_n = f(z)\,\phi_n$$

and hence

$$|f(z)| \ge \frac{\sigma_n(A_n - zI)}{\|\frac{1}{\sqrt n}X_n - zI\|_{op}},$$

where $\sigma_n(A_n - zI)$ is the least singular value of $A_n - zI$. By Theorem 1.4 we have $\|\frac{1}{\sqrt n}X_n - zI\|_{op} = O(1)$ with overwhelming probability. The claim now follows from the least singular value bounds in [44, Lemma 4.1], [im, Theorem 2.1], or [33, Theorem 4.1], since we may express $A_n - zI$ as the sum of the normalised iid random matrix $\frac{1}{\sqrt n}X_n$ and the deterministic matrix $\mu\sqrt n\,\phi_n\psi_n^* - zI$, which has polynomial size.

Now we prove (ii). Let $v$ be the vector $v := (\frac{1}{\sqrt n}X_n - zI)^{-1}\phi_n$. It will suffice to show that

$$\mathbf P(\sqrt n\,\langle v, \psi_n\rangle \in I) \ll |I| + n^{-c}$$

for every interval $I$. From Theorem 1.4 and Neumann series, we see that with overwhelming probability,

(17) $\Big\|\Big(\tfrac{1}{\sqrt n}X_n - zI\Big)^{-1}\Big\|_{op},\ \Big\|\tfrac{1}{\sqrt n}X_n - zI\Big\|_{op} = O(1)$

and so

$$1 \ll \|v\| \ll 1.$$

Let $\delta > 0$ be a small quantity (independent of $n$) to be chosen later. Call $v$ delocalised if we have $|v_i| \ge \delta/\sqrt n$ for at least $\delta n$ of the indices $i = 1,\dots,n$. As the coefficients of $\psi_n$ are iid (and are independent of $v$), we see from the Berry-Esseen theorem (see e.g. [PHI, Chapter XVI]) that the contribution of the delocalised $v$ is acceptable. It thus suffices to show that $v$ is delocalised with probability $1 - O(n^{-c})$ for some $c > 0$.



We will use an epsilon-net argument (cf. [HI], [10]). If $v$ is not delocalised, then $v$ lies within $O(\delta)$ in norm of a vector $w$ of norm comparable to 1 that is sparse in the sense that it is supported on at most $\delta n$ indices, simply by restricting $v$ to those indices $i$ for which $|v_i| \ge \delta/\sqrt n$, and truncating all other coefficients to zero.

From (17) and the triangle inequality, we thus have

$$\Big\|\Big(\tfrac{1}{\sqrt n}X_n - zI\Big)w - \phi_n\Big\| = O(\delta).$$

The number of possible supports for $w$ is at most $\exp(O(\delta\log\frac{1}{\delta}\times n))$, thanks to Stirling's formula. Once one fixes the supports, one can cover the range of $w$ by $O(1/\delta)^{\delta n}$ balls in $\ell^2$ norm of radius $\delta$. Thus, by moving $w$ by at most $O(\delta)$ if necessary, we may assume that $w$ lies in a net $\Sigma$ of cardinality

(18) $|\Sigma| \le \exp\big(O(\delta\log\tfrac{1}{\delta}\times n)\big).$



For each fixed $w$, the expression $\sqrt n\,\|(\frac{1}{\sqrt n}X_n - zI)w - \phi_n\|$ is a convex, 1-Lipschitz function of $X_n$ as measured using the Frobenius norm $\|A\|_F := (\operatorname{trace}(A^*A))^{1/2}$. Applying either the Talagrand concentration inequality (if $x$ is bounded) or the Levy concentration inequality (if $x$ is Gaussian), we conclude that

$$\mathbf P\Big(\Big|\sqrt n\,\Big\|\Big(\tfrac{1}{\sqrt n}X_n - zI\Big)w - \phi_n\Big\| - M\Big| \ge \lambda\Big) \ll e^{-c\lambda^2}$$

for some absolute constant $c > 0$, where $M$ is the median value of $\sqrt n\,\|(\frac{1}{\sqrt n}X_n - zI)w - \phi_n\|$. To compute this median, we first compute the second moment

$$\mathbf E\Big(\sqrt n\,\Big\|\Big(\tfrac{1}{\sqrt n}X_n - zI\Big)w - \phi_n\Big\|\Big)^2.$$

Expanding this out and using the fact that $X_n$ is an iid random matrix, this simplifies to

$$n\big(\|zw + \phi_n\|^2 + \|w\|^2\big).$$

In particular, this expression is comparable to $n$. From the concentration inequality, we conclude that

$$\mathbf P\Big(\Big\|\Big(\tfrac{1}{\sqrt n}X_n - zI\Big)w - \phi_n\Big\| \ll \delta\Big) \le \exp(-cn)$$

if $\delta$ is small enough. Summing up over all $w \in \Sigma$ using (18) and the union bound, we obtain the claim. □

This concludes the proof of part (i) of Theorem 1.11.

5.2. Correlation functions. Now we prove part (ii) of Theorem 1.11. From part (i), we have the bound

(19) $\int_{\Omega^k}\rho_n^{(k)}(z_1,\dots,z_k)\,d^2z_1\dots d^2z_k = O(1)$

for each fixed $k$. A similar (but simpler) argument also shows that

(20) $\int_{\Omega^k}\rho_\infty^{(k)}(z_1,\dots,z_k)\,d^2z_1\dots d^2z_k = O(1)$

(the point being that the Gaussian random Laurent series $g(z)$ has a Gaussian distribution at each $z$ with an explicitly computable variance, so one can easily control the moments of $\log_+\frac{1}{|g(z)|}$). As a consequence of these bounds, we can control perturbations to the test functions $F$ that are small in the uniform norm.



From Theorem 1.4 we know that the spectral radius of $\frac{1}{\sqrt n}X_n$ is at most $1+\varepsilon$ with overwhelming probability. The tail event when the spectral radius exceeds $1+\varepsilon$ is thus negligible for the purposes of computing the asymptotics of the correlation functions and will thus be ignored.



^For an extensive discussion of concentration inequalities, see [30]. Note that many other atom distributions also enjoy concentration inequalities, such as those distributions with the log-Sobolev property; we will not attempt to aim for maximal generality here.

Conditioning on the event that the spectral radius is at most $1+\varepsilon$, we may then apply Lemma 2.1 to conclude that the eigenvalues of $A_n$ in $\Omega$ are precisely the zeroes in $\Omega$ of the random analytic function

$$f(z) := 1 + \mu\sqrt n\,\Big\langle\Big(\tfrac{1}{\sqrt n}X_n - zI\Big)^{-1}\phi_n,\ \psi_n\Big\rangle.$$

Thus, up to errors of $o(1)$, the correlation function $\rho_n^{(k)}$ is equal on $\Omega$ to the correlation function of the zeroes of $f$ (defined in the same manner).

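As in previous sections, once the spectral radius is at most $1+\varepsilon$, we can expand $f$ as a convergent Neumann series

$$f(z) = 1 - \mu\sum_{j=0}^{\infty}\frac{f_{n,j}}{z^{j+1}}$$

where

$$f_{n,j} := \sqrt n\,\Big\langle\Big(\tfrac{1}{\sqrt n}X_n\Big)^j\phi_n,\ \psi_n\Big\rangle.$$

To control this expression properly, we will need to work instead with the truncated Neumann series

$$f(z) = 1 - \mu\sum_{j=0}^{J}\frac{f_{n,j}}{z^{j+1}} + \mu\,\frac{R_J(z)}{z^{J+1}}$$

for any $J \ge 1$, where the remainder $R_J$ is given by the formula

$$R_J(z) := \sqrt n\,\Big\langle\Big(\tfrac{1}{\sqrt n}X_n\Big)^{J+1}\Big(\tfrac{1}{\sqrt n}X_n - zI\Big)^{-1}\phi_n,\ \psi_n\Big\rangle.$$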

We now obtain a concentration bound on Rj{z): 

Lemma 5.2. Let $A, J \ge 1$. If $n$ is sufficiently large (depending on $A, J, \varepsilon, k$), then for each $z$ with $|z| \ge 1+2\varepsilon$ and all $\lambda > 0$, one has

$$\mathbf P(|R_J(z)| \ge \lambda J) \le e^{-c\lambda^2} + n^{-A}$$

for some $c > 0$ depending only on $\varepsilon, p, k$.



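Proof. Let $m_0$ be an integer such that $(m_0+1) < (1+\varepsilon)^{m_0}$. By Theorem 1.4 we see that with overwhelming probability, we have

$$\Big\|\tfrac{1}{\sqrt n}X_n\Big\|_{op} \ll 1$$

and

$$\Big\|\Big(\tfrac{1}{\sqrt n}X_n\Big)^{m_0}\Big\|_{op} \le (1+\varepsilon)^{m_0}$$

and hence by Neumann series

$$\Big\|\Big(\tfrac{1}{\sqrt n}X_n - zI\Big)^{-1}\Big\|_{op} \ll 1$$

(recall that we allow implied constants to depend on $\varepsilon$). Henceforth we condition on the above event. By another application of Theorem 1.4, we see that with overwhelming probability, we have

$$\Big\|\Big(\tfrac{1}{\sqrt n}X_n\Big)^{J+1}\Big\|_{op} \ll J$$

and hence

$$\Big\|\Big(\tfrac{1}{\sqrt n}X_n\Big)^{J+1}\Big(\tfrac{1}{\sqrt n}X_n - zI\Big)^{-1}\phi_n\Big\| \ll J.$$

Note that the random vector $\psi_n$ is independent of $X_n$. The claim then follows from the Azuma-Hoeffding inequality. □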



Meanwhile, we have the following law for the $f_{n,j}$:

Lemma 5.3. Let $J \ge 1$. As $n \to \infty$, the random variables $f_{n,0},\dots,f_{n,J}$ converge jointly in distribution to $J+1$ iid copies $g_0,\dots,g_J$ of the real normal distribution $N(0,1)_{\mathbf R}$.

Proof. By the central limit theorem, $f_{n,0} = \sqrt n\,\langle\phi_n,\psi_n\rangle$ converges in distribution to $N(0,1)_{\mathbf R}$. Now freeze $\psi_n$ and thus $f_{n,0}$. By the law of large numbers, $\psi_n$ has a norm of $1+o(1)$ with probability $1-o(1)$. Conditioning on this event, we see from Proposition 4.1 that $f_{n,1},\dots,f_{n,J}$ converge in distribution to $g_1,\dots,g_J$. Integrating out the conditioning on $\psi_n$ we obtain the claim. □

In view of this lemma and the Skorokhod representation theorem, we may thus find iid copies $g_0, g_1, g_2, \dots$ of the real normal distribution $N(0,1)_{\mathbf R}$ (depending on $n$) that are coupled to the $f_{n,j}$ in such a way that

(21) $\sup_{0\le j\le J}|f_{n,j} - g_j| = o(1)$

uniformly with probability $1-o(1)$, for each $J \ge 1$. We thus have

$$1 - \mu\sum_{j=0}^{J}\frac{f_{n,j}}{z^{j+1}} = 1 - \mu\sum_{j=0}^{J}\frac{g_j}{z^{j+1}} + o(1)$$

uniformly with probability $1-o(1)$, for any fixed $J$.



Next, we introduce the function

$$g(z) := 1 - \mu\sum_{j=0}^{\infty}\frac{g_j}{z^{j+1}}.$$

The correlation functions $\rho_\infty^{(k)}$ are the correlation functions of the zeroes of $g$. It thus suffices to show that the correlation functions of the zeroes of $f$ converge in the vague topology to the correlation functions of the zeroes of $g$.

For any fixed $z$ in $\Omega$, the tail $\mu\sum_{j=J+1}^{\infty}\frac{g_j}{z^{j+1}}$ is a Gaussian with mean zero and variance $O(|z|^{-2(J+1)})$. It thus obeys the same tail bound as Lemma 5.2 (indeed it obeys slightly better bounds). From this, Lemma 5.2 and (21) we conclude that

$$\mathbf P\big(|f(z) - g(z)| \ge \lambda J/|z|^{J+1}\big) \ll e^{-c'\lambda^2} + o(1)$$

for any $\lambda \ge 1$ and $J \ge 1$, where $c' > 0$ is an absolute constant and the decay rate $o(1)$ can depend on $z$. Letting $J \to \infty$, we conclude that $f(z) - g(z)$ converges in probability to zero for any fixed $z \in \Omega$.

Given any smooth compactly supported function $F : \Omega \to \mathbf C$, define the random variables

$$F(\Lambda_f) := \sum_{w\in\Lambda_f}F(w)$$

and

$$F(\Lambda_g) := \sum_{w\in\Lambda_g}F(w)$$

where $\Lambda_f, \Lambda_g$ are the zeroes of $f, g$ respectively (counting multiplicity). By the Stone-Weierstrass theorem (using (19), (20) to control errors that are small in the uniform norm), it suffices to show that

$$\mathbf E F_1(\Lambda_f)\dots F_k(\Lambda_f) = \mathbf E F_1(\Lambda_g)\dots F_k(\Lambda_g) + o(1)$$

for all smooth compactly supported $F_1,\dots,F_k : \Omega \to \mathbf C$. From part (i) of Theorem 1.11 we know that $F_1(\Lambda_f)\dots F_k(\Lambda_f)$ is uniformly integrable in $n$ (indeed, it has bounded $L^m$ norm for each $m$). Thus it suffices to show that $F_1(\Lambda_f)\dots F_k(\Lambda_f) - F_1(\Lambda_g)\dots F_k(\Lambda_g)$ converges in probability to zero. By another appeal to Theorem 1.11 (i), it suffices to show that $F_j(\Lambda_f) - F_j(\Lambda_g)$ converges in probability to zero for each $j$.

Fix $j$, and write $F$ for $F_j$. By Green's theorem, we can write

$$F(\Lambda_f) = \frac{1}{2\pi}\int_\Omega(\Delta F(z))\log|f(z)|\,d^2z$$

where $\Delta := \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ is the usual Laplacian. Similarly for $F(\Lambda_g)$. Thus it suffices to show that

$$\frac{1}{2\pi}\int_\Omega(\Delta F(z))\big(\log|f(z)| - \log|g(z)|\big)\,d^2z$$

converges in probability to zero.

We already know that for fixed $z$, $f(z) - g(z)$ converges in probability to zero. The function $\Delta F$ is bounded and compactly supported. To conclude the claim it then suffices by a truncation argument (cf. [44, Lemma 3.1]) to obtain the uniform integrability bounds

$$\mathbf E\int_{\operatorname{supp}(F)}\big|\log|f(z)|\big|^2 + \big|\log|g(z)|\big|^2\,d^2z = O(1).$$

But the bound for $f$ follows from (15); the bound for $g$ can be deduced from $f$ by a Fatou lemma type argument. The proof of Theorem 1.11 is now complete.



6. Zero row sum 

We now prove Theorem 1.13. We begin by proving the spectral radius upper bound

$$\rho\Big(\tfrac{1}{\sqrt n}X_nP_n\Big) \le 1 + o(1)$$

which holds almost surely. It will suffice to show that almost surely one has

$$\Big\|\Big(\tfrac{1}{\sqrt n}X_nP_n\Big)^m\Big\|_{op} \le O(m^{O(1)}) + o(1)$$

for each $m \ge 1$. Writing $P_n = 1 - (1 - P_n)$, expanding, and applying Theorem 1.4 and the fact that the operator norm forms a Banach algebra, it then suffices to show that

$$\Big\|(1 - P_n)\Big(\tfrac{1}{\sqrt n}X_n\Big)^j(1 - P_n)\Big\|_{op} = o(1)$$

almost surely for each fixed $j$, as this handles all but the $O(m)$ terms in the expansion that involve at most one factor of $1 - P_n$, each of which is $O(m^2)$ at worst by Theorem 1.4. But this bound follows from Lemma 2.3.

The spectral radius lower bound will follow from the circular law claim. Since almost sure convergence implies convergence in probability by the dominated convergence theorem, it will suffice to show that $\mu_{\frac{1}{\sqrt n}X_nP_n}$ and $\mu_{\frac{1}{\sqrt n}X_n}$ have the same almost sure limit. Applying the replacement principle ([44, Theorem 2.1]), it suffices to show that for almost every complex number $z$, one has

(22) $\frac{1}{n}\log\Big|\det\Big(\tfrac{1}{\sqrt n}X_nP_n - z\Big)\Big| - \frac{1}{n}\log\Big|\det\Big(\tfrac{1}{\sqrt n}X_n - z\Big)\Big|$

converges almost surely to zero.
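The role of the projection $P_n$ can be illustrated with a few lines of code. The sketch below assumes, as the later steps of this section suggest, that $1 - P_n$ is the orthogonal projection onto $\phi_n = (1,\dots,1)/\sqrt n$, so that right-multiplication by $P_n$ subtracts from each row its mean. It then checks two things: the rows of $X_nP_n$ sum to zero, and adding a constant $\mu$ to every entry of $X_n$ does not change $X_nP_n$. Function names and parameter values are hypothetical.

```python
import random


def project_rows(x):
    """Return X P, where P = I - phi phi^T and phi = (1, ..., 1)/sqrt(n).

    Right-multiplying by P subtracts from every entry the mean of its row.
    """
    result = []
    for row in x:
        mean = sum(row) / len(row)
        result.append([entry - mean for entry in row])
    return result


def max_abs_row_sum(x):
    """Largest absolute row sum of the matrix x."""
    return max(abs(sum(row)) for row in x)


def demo(n=50, mu=3.0, seed=2):
    """Return (largest row sum of X P, largest entrywise gap after shifting by mu)."""
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    shifted = [[entry + mu for entry in row] for row in x]  # entries now have mean mu
    xp = project_rows(x)
    sp = project_rows(shifted)
    gap = max(abs(a - b) for ra, rb in zip(xp, sp) for a, b in zip(ra, rb))
    return max_abs_row_sum(xp), gap


if __name__ == "__main__":
    print(demo())
```

Both returned quantities should be zero up to floating-point rounding, which is the elementary mechanism by which the row sum condition removes the mean-induced rank one perturbation.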
Fix $z$; we may take $z$ to be non-zero. We allow implied constants in the $O()$ notation to depend on $z$. We can rewrite (22) as

(23) $\frac12\int_0^\infty\log t\,d\nu'_n(t) - \frac12\int_0^\infty\log t\,d\nu_n(t)$

where $\nu'_n, \nu_n$ are the ESDs of $(\frac{1}{\sqrt n}X_nP_n - z)(\frac{1}{\sqrt n}X_nP_n - z)^*$ and $(\frac{1}{\sqrt n}X_n - z)(\frac{1}{\sqrt n}X_n - z)^*$ respectively.

The matrix $\frac{1}{\sqrt n}X_nP_n - z$ is a rank one perturbation of $\frac{1}{\sqrt n}X_n - z$, and so the singular values $\sigma_i(\frac{1}{\sqrt n}X_nP_n - z)$ of the former interlace those of the latter in the sense that

(24) $\sigma_{i-1}\Big(\tfrac{1}{\sqrt n}X_n - z\Big) \ge \sigma_i\Big(\tfrac{1}{\sqrt n}X_nP_n - z\Big) \ge \sigma_{i+1}\Big(\tfrac{1}{\sqrt n}X_n - z\Big)$

whenever $i$ is such that the expressions are well-defined (i.e. $1 < i \le n$ for the first inequality and $1 \le i < n$ for the second). This adequately controls all of the singular values of $\frac{1}{\sqrt n}X_nP_n - z$ except for the smallest and largest. But from Theorem 1.4 we know that the largest singular value of both matrices is $O(1)$. From [44, Lemma 4.1] we almost surely also have a lower bound

$$\sigma_n\Big(\tfrac{1}{\sqrt n}X_n - z\Big) \gg n^{-O(1)}$$

for all sufficiently large $n$. So if we can also obtain the corresponding bound

(25) $\sigma_n\Big(\tfrac{1}{\sqrt n}X_nP_n - z\Big) \gg n^{-O(1)}$

almost surely for all sufficiently large $n$, then by the alternating series test we see that

$$(23) = O\Big(\frac{1 + |\log n^{-O(1)}|}{n}\Big) = o(1)$$

as required. So it will suffice to establish the least singular value bound (25). By the Borel-Cantelli lemma, it will suffice to show that

$$\mathbf P\Big(\sigma_n\Big(\tfrac{1}{\sqrt n}X_nP_n - z\Big) \le n^{-C}\Big) = O(n^{-2})$$

(say) for all sufficiently large $n$, and some absolute constant $C$. Taking transposes, it suffices to show that

$$\mathbf P\Big(\sigma_n\Big(\tfrac{1}{\sqrt n}P_nX_n - z\Big) \le n^{-C}\Big) = O(n^{-2}).$$



Let $C$ be chosen later. In order for the above event to hold, there must exist a unit vector $v$ such that

$$\Big\|\tfrac{1}{\sqrt n}P_nX_nv - zv\Big\| \le n^{-C}.$$

We now work to eliminate the role of the projection $P_n$ by dropping a dimension. Taking inner products with $\phi_n$, we see that

$$|z|\,|\langle v, \phi_n\rangle| \le n^{-C}.$$

Since $z$ is fixed and non-zero, we thus see that

$$v - P_nv = O(n^{-C+O(1)})$$

and thus

$$\tfrac{1}{\sqrt n}P_nX_nP_nv - zP_nv = O(n^{-C+O(1)}).$$

If we let $v' := P_nv/\|P_nv\|$, we thus see that $v'$ is orthogonal to $\phi_n$ and

$$\tfrac{1}{\sqrt n}P_nX_nv' - zv' = O(n^{-C+O(1)})$$

and thus

$$\tfrac{1}{\sqrt n}X_nv' - zv' = a\phi_n + O(n^{-C+O(1)})$$

for some complex number $a$. Writing

$$v' = (v'_1,\dots,v'_{n-1},\ -v'_1 - \dots - v'_{n-1})$$

we thus have

$$\sum_{j=1}^{n-1}\frac{1}{\sqrt n}(x_{ij} - x_{in})v'_j - zv'_i = \frac{a}{\sqrt n} + O(n^{-C+O(1)})$$

for all $i = 1,\dots,n$. Subtracting off the $i = n$ equation to eliminate $a$, we conclude that

$$\sum_{j=1}^{n-1}\Big(\frac{1}{\sqrt n}(x_{ij} - x_{in} - x_{nj} + x_{nn}) - z(\delta_{ij} + 1)\Big)v'_j = O(n^{-C+O(1)})$$

for all $i = 1,\dots,n-1$. Since $v'$ has unit norm, $(v'_1,\dots,v'_{n-1})$ has norm between $1$ and $1/2\sqrt n$, and we conclude that

(26) $\sigma_{n-1}\Big(\tfrac{1}{\sqrt n}X_{n-1} + D_{n-1}\Big) = O(n^{-C+O(1)})$

where $X_{n-1}, D_{n-1}$ are the $(n-1)\times(n-1)$ matrices

$$X_{n-1} := (x_{ij})_{1\le i,j\le n-1}$$

and

$$D_{n-1} := \Big(\tfrac{1}{\sqrt n}(-x_{in} - x_{nj} + x_{nn}) - z(\delta_{ij} + 1)\Big)_{1\le i,j\le n-1}.$$

If we condition $x_{in}, x_{nj}, x_{nn}$ to be fixed, then $D_{n-1}$ is deterministic, while $X_{n-1}$ remains an iid random matrix. Applying [33, Lemma 4.1], [1^, Theorem 2.1], or [13, Theorem 4.1], we see that the conditional probability of (26) is $O(n^{-2})$ if $C$ is large enough, and if the $x_{in}, x_{nj}, x_{nn}$ are bounded in magnitude by (say) a fixed power of $n$. Integrating out the conditioning (and using Chebyshev's inequality and the union bound to handle the rare event when one of the entries $x_{in}, x_{nj}, x_{nn}$ exceeds this power of $n$ in magnitude) we obtain the claim. This concludes the proof of Theorem 1.13.

^More precisely, the integrals $\int_0^\infty\log t\,d\nu_n(t)$ and $\int_0^\infty\log t\,d\nu'_n(t)$ are both averages of $n$ increasing quantities of size $O(1 + |\log n^{-O(1)}|)$; by the interlacing property (24) (which bounds the even terms in the latter average by the odd terms in the former, and vice versa), the difference between these two averages can be rearranged as an average of two alternating series whose terms are increasing in magnitude.